496 research outputs found

Détection et résolution d'entités nommées dans des dépêches d'agence

Author: Sagot Benoît
Stern Rosa
Publication venue: HAL CCSD
Publication date: 19/07/2010

We present NP, a named entity recognition system. It includes a resolution module that associates each entity mention with the referent it denotes among the entries of a dedicated reference database. NP thus provides information relevant to exploiting named entity extraction in applied settings. The system is evaluated using a manually annotated corpus developed for the detection and resolution tasks.

INRIA a CCSD electronic archive server

Hal-Diderot

Analyse discursive des incises de citation

Author: Danlos Laurence
Sagot Benoît
Stern Rosa
Publication venue: HAL CCSD
Publication date: 01/01/2010

This article presents a complete analysis of discourses made up of a quotation and a reporting clause (incise de citation) headed by a so-called quotation verb. We carried out this study on a corpus of AFP news wires, in which this type of construction is particularly frequent. We identify three classes of quotation verbs: transitive reported-speech verbs (dire, déclarer), intransitive verbs (plaisanter, fulminer), and transitive verbs that are not reported-speech verbs (interrompre, commenter, continuer), the last of which are often overlooked. We show that the relation between a quotation and its reporting clause is not purely sentential, and in particular that the semantic relation between the two can only be recovered at the discourse level, by taking into account the discourse context to the left of the sentence concerned. These results led us to propose a lexical model of quotation verbs within the Lefff syntactic lexicon, so that automatic syntactic parsers can handle these constructions. They also led us to propose a detailed discourse analysis of the relation between quotation and reporting clause, in which the quotation verb is associated with a discourse subcategorization frame distinct from its sentential subcategorization frame. This is a first step toward satisfactory handling of these constructions in an automatic syntactic parser, and toward extending it to the discourse level.

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

INRIA a CCSD electronic archive server

Hal-Diderot

Coopération de méthodes statistiques et symboliques pour l'adaptation non-supervisée d'un système d'étiquetage en entités nommées

Author: Béchet Frédéric
Sagot Benoît
Stern Rosa
Publication venue: HAL CCSD
Publication date: 27/06/2011

Named entity recognition and typing is achieved by both symbolic and probabilistic systems. We report on an experiment in which the rule-based system NP, a high-precision system developed on AFP news corpora that relies on the Aleda named entity database, interacts with LIANE, a high-recall probabilistic system trained on oral transcriptions from the ESTER corpus. We show that a probabilistic system such as LIANE can be adapted to a new type of corpus in an unsupervised way thanks to large-scale corpora automatically annotated by NP. This adaptation does not require any additional manual annotation and illustrates the complementarity between numeric and symbolic techniques for tackling linguistic tasks.

HAL AMU

INRIA a CCSD electronic archive server

Hal-Diderot

Combining Differential Kinematics and Optical Flow for Automatic Labeling of Continuum Robots in Minimally Invasive Surgery

Author: Bordoux Valentin
Nageotte Florent
Rosa Benoît
Publication venue: Frontiers Media SA
Publication date: 06/09/2019

The segmentation of continuum robots in medical images can be of interest for analyzing surgical procedures or for controlling them. However, the automatic segmentation of continuous and flexible shapes is not an easy task. On the one hand, conventional approaches are not adapted to the specificities of these instruments, such as imprecise kinematic models; on the other hand, techniques based on deep learning have shown interesting capabilities but need many manually labeled images. In this article we propose a novel approach for segmenting continuum robots in endoscopic images, which requires no prior on the instrument's visual appearance and no manual annotation of images. The method combines kinematic and differential kinematic models of the robot with the analysis of optical flow in the images. A cost function aggregating information from the acquired image, from optical flow, and from robot encoders is optimized using particle swarm optimization and provides estimated parameters of the pose of the continuum instrument and a mask defining the instrument in the image. In addition, temporal consistency is assessed in order to improve the stochastic optimization and reject outliers. The proposed approach has been tested on the robotic instruments of a flexible endoscopy platform, both for benchtop acquisitions and for an in vivo video. The results show the ability of the technique to correctly segment the instruments without a prior, and in challenging conditions. The obtained segmentation can be used for several applications, for instance for providing automatic labels for machine learning techniques.

INRIA a CCSD electronic archive server

Population of a Knowledge Base for News Metadata from Unstructured Text and Web Data

Author: Sagot Benoît
Stern Rosa
Publication venue: HAL CCSD
Publication date: 07/06/2012

We present a practical use case of knowledge base (KB) population at the French news agency AFP. The target KB instances are entities relevant for news production and content enrichment. In order to acquire uniquely identified entities over news wires, i.e. textual data, and integrate the resulting KB in the Linked Data framework, a series of data models need to be aligned: Web data resources are harvested for creating a wide coverage entity database, which is in turn used to link entities to their mentions in French news wires. Finally, the extracted entities are selected for instantiation in the target KB. We describe our methodology along with the resources created and used for the target KB population.

INRIA a CCSD electronic archive server

Hal-Diderot

Annotation référentielle du Corpus Arboré de Paris 7 en entités nommées

Author: Richard Marion
Sagot Benoît
Stern Rosa
Publication venue: HAL CCSD
Publication date: 01/06/2012

The French TreeBank developed at the University Paris 7 is the main source of morphosyntactic and syntactic annotations for French. However, it does not include explicit information related to named entities, which are among the most useful information for several natural language processing tasks and applications. Moreover, no large-scale French corpus with named entity annotations contains referential information, which complements the type and the span of each mention with an indication of the entity it refers to. We have manually annotated the French TreeBank with such information, after an automatic pre-annotation step. We sketch the underlying annotation guidelines and we provide a few figures about the resulting annotations.

INRIA a CCSD electronic archive server

Traitement des inconnus : une approche systématique de l'incomplétude lexicale

Author: Blancafort San José Helena
Couto Javier
Recourcé Gaëlle
Sagot Benoît
Stern Rosa
Teyssou Denis
Publication venue: HAL CCSD
Publication date: 19/07/2010

This article addresses the incompleteness of lexical resources, that is, the problem of unknown words, in the context of automatic processing. We first propose an operational definition of the notion of unknown word. We then describe a typology of the different classes of unknown words, motivated by linguistic and application-oriented considerations as well as by the annotation of the unknown words of a small corpus according to our typology. This typology will be implemented and validated through the annotation of a large Agence France-Presse corpus as part of the EDyLex project.

INRIA a CCSD electronic archive server

Hal-Diderot

Mayaro Virus in Wild Mammals, French Guiana

Author: Benoît de Thoisy
Bowen
de Thoisy
Fandeur
Fischer-Tenhagen
Haas
Hoch
Jacques Gardon
Jacques Morvan
Lindsey
Mirdad Kazanji
Monath
Rosa Alba Salas
Scott
Seymour
Spalding
Talarmin
Tesh
Vasconcelos
Vié
Vogel
Walder
Publication venue: Centers for Disease Control and Prevention
Publication date: 01/10/2003

A serologic survey for Mayaro virus (Alphavirus, Togaviridae) in 28 wild nonflying forest mammal species in French Guiana showed a prevalence ranging from 0% to 52% and increasing with age. Species active during the day and those that spent time in trees were significantly more infected, results consistent with transmission implicating diurnal mosquitoes and continuous infectious pressure.

Crossref

Directory of Open Access Journals

PubMed Central